Focus on data pre-processing of AIMS / DIMS data:
- Different software solutions
- Recommendations for handling direct MS towards qualitative and quantitative data
Software solutions: Progenesis, xcms.
Either a single mass spectrum is acquired, or multiple mass spectra are acquired over a short timespan.
Plekhova, V., et al. Nature protocols. 2021
Sample measurement leads to a recognisable ‘ambient’ peak. Mass spectra before and after are mainly noise from chemical and instrumental origin.
Plekhova, V., et al. Nature protocols. 2021
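The idea of separating the 'ambient' peak from the surrounding noise scans can be sketched with a simple threshold on the total ion current (TIC). This is a minimal illustration only: the TIC values are invented, and `select_ambient_scans` and its `fraction` argument are hypothetical names, not part of any package used here.

```r
# Toy TIC trace: noise scans around one 'ambient' burst.
# All values are invented for illustration.
tic <- c(1e4, 1.2e4, 9e3, 4e5, 2.8e6, 3.1e6, 1.9e6, 3e5, 1.1e4, 1e4)

# Hypothetical helper: keep scans whose TIC reaches a given
# fraction of the maximum TIC in the acquisition.
select_ambient_scans <- function(tic, fraction = 0.1) {
  which(tic >= fraction * max(tic))
}

ambient <- select_ambient_scans(tic)
ambient
```

The threshold fraction is a judgment call: a lower value keeps more scans from the flanks of the burst, where the share of noise is likely higher.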
Mass peaks of biological origin within the ‘ambient’ peak. However, large variability across scans, negatively influencing reproducibility.
sps <- raw %>% filterFile(3) %>% spectra
sps.binned <- Spectra::bin(sps[which.max(tic(sps))], binSize = 0.01)
peaks <- sps.binned %>% peaksData
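What fixed-width binning with a bin size of 0.01 does to a spectrum can be sketched in base R. This is a conceptual sketch and not the Spectra implementation: the m/z and intensity values are invented, and summing the intensities within each bin is an assumption about the aggregation.

```r
# Toy spectrum; values are invented for illustration.
mz        <- c(100.001, 100.004, 100.012, 100.018, 100.031)
intensity <- c(10, 20, 5, 15, 8)
bin_size  <- 0.01

# Assign each m/z value to a fixed-width bin, counted from the lowest bin edge.
start  <- floor(min(mz) / bin_size) * bin_size
bin_id <- floor((mz - start) / bin_size)

# Sum intensities per bin (assumed aggregation) and report the bin centres.
summed <- tapply(intensity, bin_id, sum)
binned <- data.frame(
  mz        = start + (as.numeric(names(summed)) + 0.5) * bin_size,
  intensity = as.numeric(summed)
)
binned
```

Empty bins are simply absent from the result, so the binned spectrum only contains m/z regions where signal was recorded.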
LA-REIMS data: Waters .RAW files with Progenesis QI
Peak detection using the centWave (continuous wavelet transform, CWT) algorithm on centroided data.
cwp <- CentWaveParam(peakwidth=c(2,8), noise=100, ppm=50, snthresh=0.5, mzdiff=0.05,
fitgauss=TRUE, extendLengthMSW=TRUE, firstBaselineCheck=FALSE,
prefilter=c(1,100))
chr_1 <- findChromPeaks(chr_1, cwp)
chromPeaks(chr_1)
| rt | rtmin | rtmax | into | intb | maxo | sn | sample |
|---|---|---|---|---|---|---|---|
| 11.154 | 8.109 | 13.183 | 8.6e+06 | 7.0e+06 | 3.1e+06 | 3 | 1 |
| 8.110 | 6.080 | 11.154 | 8.8e+06 | 7.2e+06 | 3.7e+06 | 4 | 2 |
| 12.168 | 9.124 | 14.198 | 1.1e+07 | 8.9e+06 | 4.0e+06 | 3 | 3 |
| 11.154 | 8.109 | 13.183 | 1.2e+07 | 9.0e+06 | 3.7e+06 | 3 | 4 |
‘Chromatographic peaks’ are not consistently identified, despite good mass peak shape in individual mass spectra.
chr_3 <- findChromPeaks(chr_3, cwp)
chromPeaks(chr_3)
Peak detection on the four files yields a total of only 2620 mass peaks.
cwt <- findChromPeaks(cwt, param=cwp)
peaks <- as.data.frame(chromPeaks(cwt))
table(peaks$sample)
##    1    2    3    4
##  489 1005  823  303
After correspondence a total of 655 features are identified.
## Perform the correspondence using fixed m/z bin sizes.
pdp <- PeakDensityParam(sampleGroups = sampleData(cwt)$sample_group,
minFraction = 0.4, bw = 30)
cwt <- groupChromPeaks(cwt, param = pdp)
featureDefinitions(cwt) |> head()
The majority of features have 2 to 4 ‘chromatographic’ peaks assigned.
table(featureDefinitions(cwt)$npeaks)
##   2   3   4   5   6
## 298 169 157  22   9
A minority of features is defined by more than 4 mass peaks.
| peak ID | mz | mzmin | mzmax | rt | rtmin | rtmax | into | intb | maxo | sn | sample |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CP0420 | 333.0618 | 333.0604 | 333.0641 | 11.154 | 8.109 | 14.198 | 6.8e+04 | 4.2e+04 | 1.8e+04 | 3 | 1 |
| CP0599 | 333.1297 | 333.1277 | 333.1352 | 7.095 | 5.065 | 8.110 | 1.4e+03 | 1.3e+03 | 8.0e+02 | 10 | 2 |
| CP1451 | 333.0641 | 333.0641 | 333.0641 | 9.124 | 6.080 | 12.169 | 7.2e+04 | 4.4e+04 | 1.9e+04 | 3 | 2 |
| CP2231 | 332.9529 | 332.9520 | 332.9557 | 12.168 | 9.124 | 14.198 | 2.6e+04 | 1.4e+04 | 9.2e+03 | 2 | 3 |
| CP2232 | 333.0641 | 333.0641 | 333.0641 | 12.168 | 9.124 | 15.213 | 7.2e+04 | 4.3e+04 | 2.0e+04 | 3 | 3 |
| CP2458 | 332.9518 | 332.9482 | 332.9632 | 11.154 | 8.109 | 16.227 | 2.6e+04 | 1.1e+04 | 7.7e+03 | 1 | 4 |
CP0599: noise signal picked up as chromatographic peak outside of main peak region.
Two distinct mass peaks (m/z ≈ 332.95 and m/z ≈ 333.06) are grouped as a single feature.
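How far apart the two grouped mass peaks are can be checked with a quick ppm calculation. The m/z values are taken from the peak table above (CP2231 and CP2232, both from sample 3). `ppm_diff` is a hypothetical helper written for this illustration.

```r
# m/z values of two peaks from the same sample that ended up in one feature.
mz_a <- 332.9529  # CP2231
mz_b <- 333.0641  # CP2232

# Hypothetical helper: relative mass difference in parts per million.
ppm_diff <- function(mz1, mz2) {
  abs(mz1 - mz2) / mean(c(mz1, mz2)) * 1e6
}

d <- ppm_diff(mz_a, mz_b)
d  # several hundred ppm
```

The difference is far larger than the `ppm = 50` tolerance passed to `CentWaveParam`, which supports reading these as two separate ions that were merged during correspondence rather than one ion with mass jitter.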
Peak detection using the MassSpecWavelet package (on raw data). A total of 11057 peaks are identified; an S/N metric could be calculated for 4416 of them.
msw <- MSWParam(scales = c(0.1,0.2,0.4,0.8,1,2,4,8), nearbyPeak = TRUE,
winSize.noise = 500, SNR.method = "data.mean", snthresh = 2, ampTh = 0.00005,
peakScaleRange = 2, ridgeLength=24)
raw <- findChromPeaks(single, param=msw)
peaks <- as.data.frame(chromPeaks(raw))
Mass peaks are more uniformly detected in all samples.
A total of 2931 features were found in the spectra.
prm <- MzClustParam(sampleGroups = sampleData(raw)$sample_group)
raw <- groupChromPeaks(raw, param = prm)
features <- featureDefinitions(raw)
table(features$npeaks)
##    2    3    4
##  558  593 1780
When removing the chromatographic peaks for which no S/N could be calculated, only 1212 features are found.
##   2   3   4
## 367 366 479
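The effect of dropping peaks without an S/N value can be illustrated on a toy peak table. This is a sketch under stated assumptions, not the procedure used to obtain the counts above: the feature labels and S/N values are invented, and requiring at least 2 remaining peaks per feature is an assumption.

```r
# Toy peak table; feature assignments and S/N values are invented.
peaks <- data.frame(
  feature = c("F1", "F1", "F1", "F2", "F2", "F3", "F3", "F3", "F3"),
  sample  = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
  sn      = c(5, NA, 7, NA, NA, 3, 4, 6, 2)
)

# Keep only peaks for which an S/N value is available.
kept <- peaks[!is.na(peaks$sn), ]

# Count the remaining peaks per feature and keep features
# that are still supported by at least 2 peaks (assumed cut-off).
npeaks   <- table(kept$feature)
features <- names(npeaks)[npeaks >= 2]
features
```

In this toy case F2 disappears entirely because none of its peaks has an S/N value, while F1 survives with fewer supporting peaks.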
Whether a feature is of biological origin can be judged from its peak shape:
Kumler, W., et al. BMC Bioinformatics. 2023
Brochu, F., et al. Scientific reports. 2019
input <- 'E:/UGent_LIMET/02_Mass_spectrometry/raw_files/saliva_demo'
path_msconvert <- 'C:/Program Files/ProteoWizard/ProteoWizard 3.0.24045.2c2c542/msconvert.exe'
raw <- 'E:/UGent_LIMET/02_Mass_spectrometry/mzML_files/raw/saliva_demo'
centroided <- 'E:/UGent_LIMET/02_Mass_spectrometry/mzML_files/centroided/saliva_demo'
msconvert(input, centroided, path_msconvert, processed='cwt', mz=0.025, snr=2.0,
filter='absolute', orientation='most-intense', threshold=200, dir=TRUE,
verbose=TRUE)
msconvert(input, raw, path_msconvert, processed='none', dir=TRUE, verbose=TRUE)
cent <- readMsExperiment(spectraFiles = fls)
indcs <- scan_selection(cent, method='tic', write=TRUE)
deviation <- calculate_mz_shift(cent, refPoints=ref_pts, indcs=indcs,
shift='loess', dev=c(-150,150))
centroided <- correct_mz_drift(cent, factor=deviation)
composite <- composite_spectrum(raw, indcs, algnPoints=algn_pts, combine='avg',
normalise1='is', normalise2='tic')
composite <- recalibrate_masses(composite=composite, lockmass=215.0327891,
snr=1, q_score=0.5, ppm=3.4e-5)
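The principle behind a lock-mass recalibration can be sketched in base R. This is an illustration of the general idea only and may differ from what `recalibrate_masses` does internally: the lock mass is taken from the call above, while the toy m/z values and the `tol_ppm` search window are invented.

```r
# Theoretical lock mass from the call above; all other values are invented.
lockmass <- 215.0327891
mz       <- c(150.0500, 215.0335, 333.0641)  # toy composite spectrum (m/z)
tol_ppm  <- 10                                # hypothetical search window

# Find the observed peak closest to the lock mass and check the tolerance.
err_ppm <- (mz - lockmass) / lockmass * 1e6
hit     <- which.min(abs(err_ppm))
stopifnot(abs(err_ppm[hit]) <= tol_ppm)

# Apply the same relative (ppm) correction to every m/z value.
mz_corrected <- mz / (1 + err_ppm[hit] / 1e6)
mz_corrected
```

A single relative correction assumes the mass error scales linearly with m/z across the spectrum; where that does not hold, several reference masses would be needed.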